12/03/2019

Motivation

Standard Approach

To make statistical problems more tractable:



\(\cdot\) It is common to pool data, e.g. spatially



\(\cdot\) Partition a region, e.g. consider it region by region

Climate example

Natural Resource Management Regions

CSIRO and Bureau of Meteorology, 2015. Climate change in Australia information for Australia's natural resource management regions: Technical report.

Post-processing example


Whan, Kirien, and Maurice Schmeits. "Comparing area probability forecasts of (extreme) local precipitation using parametric and machine learning statistical postprocessing methods." Monthly Weather Review 146.11 (2018): 3651-3673.

Question






How should we partition regions for the analysis of extremes?

Application

Create regions that are likely to experience similar impacts

Regionalisation

These regions can then inform our statistical analysis

Outline

1. Regionalisation

  • Clustering
  • Dependence of bivariate extremes
  • Practicalities
  • Classification

2. Visualise spatial dependence

  • Max-stable processes

Regionalisation

Clustering Distance



Require: Measure of closeness between two locations


Want: Form clusters based on extremal dependence


Solution: The F-madogram distance





Bernard, Elsa, et al. "Clustering of maxima: Spatial dependencies among heavy rainfall in France." Journal of Climate 26.20 (2013): 7929-7937.

F-madogram distance

\[d(x_i, x_j) = \tfrac{1}{2} \mathbb{E} \left[ \left| F_i(M_{x_i}) - F_j(M_{x_j}) \right| \right]\] where \(M_{x_i}\) is the annual maximum rainfall at location \(x_i \in \mathbb{R}^2\) and \(F_i\) is the distribution function of \(M_{x_i}\).


Advantages:

  • Only use the raw block (annual) maxima
  • No information about climate or topography
  • Non-parametric estimation (fast)
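
A minimal sketch of how the F-madogram distance could be estimated non-parametrically from paired annual maxima. The function names, the rank-based plotting position `rank / (n + 1)` used for the empirical distribution function, and the arbitrary handling of ties are assumptions made for illustration, not taken from the cited papers.

```python
def empirical_cdf_values(maxima):
    """Replace each annual maximum by rank / (n + 1), an empirical estimate of F.

    Ties are ordered arbitrarily (by position), which is an assumption of this sketch.
    """
    n = len(maxima)
    order = sorted(range(n), key=lambda t: maxima[t])
    u = [0.0] * n
    for rank, t in enumerate(order, start=1):
        u[t] = rank / (n + 1)
    return u


def fmadogram(maxima_i, maxima_j):
    """Estimate d(x_i, x_j) = 0.5 * E|F_i(M_i) - F_j(M_j)| from paired annual maxima."""
    if len(maxima_i) != len(maxima_j):
        raise ValueError("both series must cover the same years")
    u_i = empirical_cdf_values(maxima_i)
    u_j = empirical_cdf_values(maxima_j)
    return 0.5 * sum(abs(a - b) for a, b in zip(u_i, u_j)) / len(u_i)
```

Because only ranks enter the estimate, two stations whose maxima are ordered identically across years have an estimated distance of zero, whatever their rainfall magnitudes.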


Cooley, D., Naveau, P. and Poncet, P., 2006. Variograms for spatial max-stable random fields. In Dependence in probability and statistics (pp. 373-390). Springer, New York, NY.

Extremal Coefficient

For \(M_{x_i}\) and \(M_{x_j}\) with common GEV marginals, \[\mathbb{P}\left( M_{x_i} \leq z, M_{x_j} \leq z \right) = \left[\mathbb{P}(M_{x_i}\leq z)\,\mathbb{P}(M_{x_j}\leq z) \right]^{\tfrac{1}{2}\theta(x_i - x_j)},\] where \(\theta(x_i - x_j)\) is the extremal coefficient and the range of \(\theta(x_i - x_j)\) is \([1 , 2]\).

The F-madogram can then be expressed as \[d(x_i, x_j) = \dfrac{\theta(x_i - x_j) - 1}{2(\theta(x_i - x_j) + 1)},\] so the range of \(d(x_i, x_j)\) is \([0 , 1/6]\).
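
The relationship between the extremal coefficient and the F-madogram can be inverted by solving for \(\theta\), which gives \(\theta = (1 + 2d)/(1 - 2d)\). A small sketch of both directions follows; the function names are hypothetical.

```python
def fmadogram_from_theta(theta):
    """Map an extremal coefficient in [1, 2] to the F-madogram distance."""
    if not 1.0 <= theta <= 2.0:
        raise ValueError("the extremal coefficient must lie in [1, 2]")
    return (theta - 1.0) / (2.0 * (theta + 1.0))


def theta_from_fmadogram(d):
    """Invert the relationship: theta = (1 + 2d) / (1 - 2d)."""
    return (1.0 + 2.0 * d) / (1.0 - 2.0 * d)
```

The endpoints behave as expected: \(\theta = 1\) (complete dependence) gives \(d = 0\), and \(\theta = 2\) (independence) gives \(d = 1/6\).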

Clustering



\(\checkmark\) Distance




\(?\) Algorithm

K-Medoids Clustering and PAM

  1. Randomly select an initial set of \(K\) stations. These are the set of the initial medoids.
  2. Assign each station, \(x_i\), to its closest medoid, \(m_k\), based on the F-madogram distance.
  3. For each cluster, \(C_k\), update the medoid according to \[m_k = \mathop{\mathrm{argmin}}\limits_{x_i \in C_k} \sum_{x_j \in C_k} d(x_i, x_j).\]
  4. Repeat steps 2 and 3 until the medoids are no longer updated.
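
The steps above can be sketched as follows for a precomputed distance matrix, such as pairwise F-madogram distances. This is an illustration of the alternating assign-and-update steps as listed, not necessarily the full build-and-swap PAM procedure of Kaufman and Rousseeuw. The function name, the `seed` and `max_iter` parameters, and the handling of empty clusters are assumptions.

```python
import random


def k_medoids(dist, k, seed=0, max_iter=100):
    """Cluster points given a symmetric n x n distance matrix (list of lists).

    Returns (medoids, labels), where labels[i] indexes into medoids.
    """
    n = len(dist)
    rng = random.Random(seed)
    medoids = rng.sample(range(n), k)  # step 1: random initial medoids

    def assign(current):
        # step 2: each point goes to its closest medoid
        return [min(range(k), key=lambda c: dist[i][current[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # assumption: keep the old medoid if a cluster empties
                new_medoids.append(medoids[c])
                continue
            # step 3: the member minimising the total distance to its cluster
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:  # step 4: stop once the medoids are stable
            break
        medoids = new_medoids
    return medoids, assign(medoids)
```

The result can depend on the random initial medoids, so in practice it may be worth running from several seeds and comparing the total within-cluster distance.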


Kaufman, L. and Rousseeuw, P.J., 1990. Partitioning around medoids (PAM). Finding groups in data: an introduction to cluster analysis, pp.68-125.

Result

Example

Consider \(\max \{ \| x_i - x_j \|, 2\}\) as the clustering distance.

Density example

Gridded data

Spatial density is changed by land-sea and domain boundaries



Tendency toward clusters of equal size



Clustering is in F-madogram space not Euclidean

Hierarchical Clustering

  1. Each station starts in its own cluster
  2. For each pair of clusters, \(C_k\) and \(C_{k'}\), define the distance between the clusters as \[d(C_k, C_{k'}) = \frac{1}{|C_k| |C_{k'}|} \sum_{x_k \in C_k} \sum_{x_{k'} \in C_{k'}} d(x_k, x_{k'}).\]
  3. Merge the two clusters with the smallest distance
  4. Update the distances relative to the new cluster
  5. Repeat steps 3 and 4 until all points are combined in a single cluster
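
A minimal pure-Python sketch of the procedure above, using the average pairwise distance between clusters from step 2. It recomputes cluster distances from scratch at every pass, so it is only suitable for small examples; the function names and the `cut_height` argument are assumptions.

```python
def average_linkage(dist):
    """Agglomerative clustering on a symmetric distance matrix (list of lists).

    Returns the merges in order as (members_a, members_b, height).
    """
    clusters = [[i] for i in range(len(dist))]  # step 1: one cluster per station
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # step 2: mean of all pairwise distances between the two clusters
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                height = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or height < best[0]:
                    best = (height, a, b)
        height, a, b = best
        merges.append((clusters[a], clusters[b], height))  # step 3: merge closest pair
        merged = clusters[a] + clusters[b]
        # step 4: distances to the new cluster are recomputed on the next pass
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return merges


def cut_tree(n_points, merges, cut_height):
    """Label points by applying only the merges at or below cut_height."""
    labels = list(range(n_points))
    for members_a, members_b, height in merges:
        if height > cut_height:
            break  # merge heights are non-decreasing for average linkage
        target = labels[members_a[0]]
        for i in members_b:
            labels[i] = target
    return labels
```

For real station networks a library routine is likely preferable, for example `scipy.cluster.hierarchy.linkage` with `method="average"`.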

Hierarchical Clustering

Back to the first example

Classify

  • Classify a station relative to its closest neighbours
  • Use a weighted classification \(w\)-kNN
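
A sketch of weighted \(k\)-nearest-neighbour classification for assigning a location to a cluster. The slides do not specify the weighting scheme, so the inverse-distance weights, the Euclidean neighbour search, and the function name are all assumptions.

```python
import math
from collections import defaultdict


def weighted_knn_label(target, stations, labels, k=3):
    """Classify `target` (x, y) from the cluster labels of its k nearest stations.

    Each neighbour votes with weight 1 / distance (an assumed weighting).
    """
    neighbours = sorted(
        zip(stations, labels), key=lambda pair: math.dist(target, pair[0])
    )[:k]
    votes = defaultdict(float)
    for station, label in neighbours:
        # small constant avoids division by zero when target coincides with a station
        votes[label] += 1.0 / (math.dist(target, station) + 1e-12)
    return max(votes, key=votes.get)
```

With weighting, one very close neighbour can outvote two distant ones, which an unweighted majority vote would not allow.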

Choosing a cut height

Similar Dependence




Where can we assume a common dependence structure for extremes?

Visualising Dependence

Max-stable Processes

  • Extremes in continuous space with dependence

  • Natural extension from univariate extreme value theory

  • Univariate marginal distributions are GEV distributions

  • Can simulate from these processes

Max-stable processes and post-processing

Oesting, M., Schlather, M. and Friederichs, P., 2017. Statistical post-processing of forecasts for extremes using bivariate Brown-Resnick processes with an application to wind gusts. Extremes, 20(2), pp.309-332.

  • Fit two max-stable fields: observations and numerical model output

  • Model the dependence between these two fields

  • Conditionally simulate observations based on the numerical model output

  • Use these simulations to generate probabilistic forecast of a spatial field

Visualising Dependence

  • Visualise the partition using elliptical level curves

\[ \mathbb{P}(\| \mathbf{x} - \mathbf{c_k} \| < r) = 1 - \exp \left( \frac{-r^2}{2} \right)\]

  • Centre the curve on the centroid \(c_k\)
  • Repeat fitting by sampling stations to understand uncertainty
  • Size and direction of ellipses have a natural interpretation in terms of dependence
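
Inverting the probability statement above for a chosen level \(p\) gives \(r = \sqrt{-2 \log(1 - p)}\). The sketch below computes points on the corresponding ellipse, assuming the norm is taken relative to a fitted \(2 \times 2\) covariance-type matrix. The matrix `sigma`, the function names, and the parameterisation are assumptions; the slides do not give these details.

```python
import math


def level_radius(p):
    """Solve 1 - exp(-r^2 / 2) = p for r."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return math.sqrt(-2.0 * math.log(1.0 - p))


def ellipse_points(centre, sigma, p, n_points=100):
    """Points x with (x - c)^T sigma^{-1} (x - c) = r^2, for a symmetric 2 x 2 sigma."""
    (a, b), (_, c) = sigma
    # Cholesky factor of sigma, written out for the 2 x 2 case
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 ** 2)
    r = level_radius(p)
    points = []
    for t in range(n_points):
        angle = 2.0 * math.pi * t / n_points
        u1, u2 = math.cos(angle), math.sin(angle)
        # map the unit circle through r * L and shift to the centre
        points.append((centre[0] + r * l11 * u1,
                       centre[1] + r * (l21 * u1 + l22 * u2)))
    return points
```

Under this assumption the off-diagonal entry of `sigma` controls the tilt of the ellipse and the diagonal entries control its extent along each axis.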

Southwest Western Australia

Tasmania

Relevance to post-processing

Perfect Prog Approach:
Simulate the ensemble from the statistical model

Assumptions:
The fitted statistical model is the truth

Relevance:
Need to ensure that the way we model the dependence is accurate

Conclusions

  • Created a hierarchical clustering
  • Identified regions of similar extremal dependence
  • Shown different regions have different extremal dependence
  • Regionalisation helps us understand where we can reasonably assume a single dependence structure

Future work: Non-stationary dependence!

e. K.R.Saunders@tudelft.nl

t. @katerobsau

g. github.com/katerobsau
